1 Summary

In this study, we analyzed germline and somatic variant calls from JHU Biobank samples to explore the genomic landscape of benign and malignant tumors in people with neurofibromatosis type 1 (NF1). While the analysis is still being refined, we identified a list of the top 100 genes carrying high-impact variants in plexiform neurofibroma (PNF) and malignant peripheral nerve sheath tumor (MPNST) samples. We also explored variants in genes of interest that represent important cellular pathways implicated as dysfunctional in PNFs and MPNSTs. Additionally, we explored the variants in triad samples (i.e., samples from patients with both benign and malignant tumors) and identified a list of the top 100 genes with high-impact somatic variants in their PNF and MPNST samples.

 

2 Methods

2.1 Genomic Variant Calling

Raw FASTQ files were quality checked using FastQC v0.11.9, and a report was generated using MultiQC v1.8. Reads were aligned to GRCh38 using BWA v0.7.17. Duplicates were marked using GATK MarkDuplicates, and base quality scores were recalibrated using GATK BaseRecalibrator and GATK ApplyBQSR (GATK v4.1.7.0). Genomic variants (a mix of germline, somatic, and de novo variants) were then called using Google’s DeepVariant software (DeepVariant v1.1.0). DeepVariant calls variants in two steps. In the first step (make_examples), a human-written heuristic identifies positions that are potentially variant and creates pileup examples of them. In the second step (call_variants), a neural network classifies whether the identified positions are real variants and genotypes them. The identified variants were annotated using Variant Effect Predictor (VEP v99.2) and converted to MAF files using vcf2maf (vcf2maf v1.6.21). All of these steps were completed on Nextflow Tower running the standardized nf-core pipeline sarek v2.7.1.

2.2 Somatic Variant Calling

After the initial alignment and base recalibration, somatic variants were called using GATK Mutect2 software (GATK v4.1.7.0). The variants were annotated using Variant Effect Predictor (VEP v99.2) and converted to MAF files using vcf2maf (vcf2maf v1.6.21). All of these steps were completed on Nextflow Tower running the standardized nf-core pipeline sarek v2.7.1.

 

3 Results

3.1 Sample Summary

The samples analyzed here were sequenced in batch 1, batch 2, and batch 3 of the JHU Biobank. Batch 1 was sequenced entirely at the JHU sequencing core. Batch 2 and batch 3 samples were sequenced at WUSTL.

 

 

3.2 All variants (germline, somatic, and de novo variants):

3.2.1 Summary of unfiltered variants

Germline, somatic, and de novo variant calls from DeepVariant were visualized in PNF and MPNST samples without any prefiltering. The samples include tumor samples from three different sequencing batches.

3.2.1.1 Note:

All variants, regardless of whether the variant caller denoted them as high-confidence or low-confidence calls, have been included in unfiltered_variants. Note that these include potential false positives. This was done to explore the full breadth of called variants before additional filters are imposed.

Figure 1 is a summary overview of all unfiltered variants identified in the samples.

Figure 1

 

Figure 2 below shows oncoplots of all unfiltered variants identified in our list of genes of interest.

Figure 2

 

3.2.2 Filtered variants in PNF and MPNST samples:

The germline, somatic, and de novo variant calls were then filtered to exclude common variants and variants predicted to have low- or moderate-impact consequences.

3.2.2.1 Filter criteria

  • All variants with “RefCall”, “common_variant”, or “RefCall;common_variant” in the FILTER column were excluded. This removes variants that are flagged as common_variant because gnomAD_AF >= 0.0005 and variants that are low-confidence calls. A RefCall entry occurs only in DeepVariant output files, when a candidate variant is proposed and then specifically rejected as non-variant.

  • Additionally, all variants with “MODERATE” or “MODIFIER” in the IMPACT column were excluded.

  • Only variants with “PASS” in the FILTER column and “HIGH” in the IMPACT column were included in the analyses below (so variants with “LOW” impact were excluded as well).
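The criteria above reduce to a single row-level test on the MAF table. The following is a minimal sketch of that test, not the code used for this report; it assumes a tab-delimited MAF with FILTER and IMPACT columns, and the function name and toy rows are hypothetical.

```python
import csv
import io


def keep_pass_high(rows, pass_value="PASS"):
    """Return rows whose FILTER equals pass_value and whose IMPACT is HIGH."""
    return [r for r in rows if r["FILTER"] == pass_value and r["IMPACT"] == "HIGH"]


# Toy MAF-like table (hypothetical rows, not Biobank data).
toy_maf = (
    "Hugo_Symbol\tTumor_Sample_Barcode\tFILTER\tIMPACT\n"
    "NF1\tsample_A\tPASS\tHIGH\n"
    "NF1\tsample_B\tRefCall\tHIGH\n"
    "SUZ12\tsample_A\tcommon_variant\tHIGH\n"
    "TP53\tsample_B\tPASS\tMODERATE\n"
    "EED\tsample_C\tRefCall;common_variant\tMODIFIER\n"
)

rows = list(csv.DictReader(io.StringIO(toy_maf), delimiter="\t"))
kept = keep_pass_high(rows)
print([(r["Hugo_Symbol"], r["Tumor_Sample_Barcode"]) for r in kept])
```

Requiring an exact “PASS” match, rather than listing the excluded FILTER values, also drops any combined tags such as “RefCall;common_variant” without naming them individually.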

 

Figure 3 below shows filtered variants in our list of genes of interest in PNF and MPNST samples. Based on the filters above, the plot below shows a mix of somatic and de-novo variants called by DeepVariant.

Figure 3

 

3.2.3 Filtered variants in Triad Samples:

Next, we specifically selected the patients who provided samples of normal, benign, and malignant tissue. This set of samples is called “TRIADS”. The patients with triad samples are: “JH-2-002”, “JH-2-015”, “JH-2-016”, “JH-2-023”, “JH-2-031”, “JH-2-045”, “JH-2-055”, “JH-2-084”.

We have 5 PNF samples and 11 MPNST samples from TRIAD patients. Note that there are more MPNST samples than PNF samples. This is mainly because 1) some patients with MPNST had multiple samples sequenced, and all of them are represented in the plots, and 2) some patients with MPNST had a benign NF1 tumor other than PNF (e.g., cNF or ANF).
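As an illustration of how triad samples can be selected and tallied by tumor type, here is a minimal sketch. The patient IDs are those listed above; the sample sheet layout, the sample IDs, and the non-triad patient “JH-2-999” are hypothetical and only serve the example.

```python
from collections import Counter

# Patient IDs listed in the text above.
TRIAD_PATIENTS = {
    "JH-2-002", "JH-2-015", "JH-2-016", "JH-2-023",
    "JH-2-031", "JH-2-045", "JH-2-055", "JH-2-084",
}


def select_triad_samples(sample_sheet, triad_patients=TRIAD_PATIENTS):
    """Keep only samples that belong to a triad patient."""
    return [s for s in sample_sheet if s["patient_id"] in triad_patients]


# Hypothetical sample sheet: one row per sequenced tumor sample.
sample_sheet = [
    {"sample_id": "tumor_1", "patient_id": "JH-2-002", "tumor_type": "PNF"},
    {"sample_id": "tumor_2", "patient_id": "JH-2-002", "tumor_type": "MPNST"},
    {"sample_id": "tumor_3", "patient_id": "JH-2-002", "tumor_type": "MPNST"},
    {"sample_id": "tumor_4", "patient_id": "JH-2-999", "tumor_type": "PNF"},  # hypothetical non-triad patient
]

triad_samples = select_triad_samples(sample_sheet)
counts = Counter(s["tumor_type"] for s in triad_samples)
print(dict(counts))
```

Counting samples rather than patients is what allows a patient with several sequenced MPNST samples to contribute more than once, which is one reason the MPNST count can exceed the PNF count.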

Figure 4 below shows filtered somatic and de novo variants in the top 100 genes in the triad samples.

Figure 4

 

Figure 5 below shows filtered variants in the triads in our list of genes of interest.

Figure 5

 

 

3.3 Somatic Variants

3.3.1 Summary of unfiltered variants

Since we had access to tumor-normal paired samples, we also used the somatic variant caller Mutect2 to detect somatic variants in the tumor samples. Somatic calls from Mutect2 were visualized in the PNF and MPNST samples without any prefiltering. The samples include tumor samples from three different sequencing batches.

3.3.1.1 Note:

All variants, regardless of whether the variant caller denoted them as high-confidence or low-confidence calls, have been included in unfiltered_variants. Note that these include potential false positives.

Figure 6 is a summary overview of all unfiltered variants identified in the samples.

Figure 6

 

Figure 7 shows an oncoplot of all unfiltered variants identified in our list of genes of interest in the PNF and MPNST samples.

Figure 7

 

3.3.2 Filtered variants in PNF and MPNST samples:

The somatic variant calls were then filtered to exclude common variants and variants predicted to have low- or moderate-impact consequences.

3.3.2.1 Filter criteria

  • All variants with “common_variant” in the FILTER column were excluded. This removes variants that are flagged as common_variant because gnomAD_AF >= 0.0005 or that are low-confidence variant calls.

  • Additionally, all variants with “MODERATE” or “MODIFIER” in the IMPACT column were excluded.

  • Only variants with “.” in the FILTER column and “HIGH” in the IMPACT column were included in the analyses below.
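To make the allele-frequency cutoff concrete, the sketch below models the common_variant tag as a simple threshold on gnomAD_AF and then applies the retention rule (FILTER of “.” and IMPACT of HIGH). This is an assumption for illustration: in the actual pipeline the tag is assigned upstream, and the helper names, the handling of missing frequencies, and the toy rows here are hypothetical.

```python
COMMON_AF_THRESHOLD = 0.0005  # allele-frequency cutoff quoted in the filter criteria


def is_common(gnomad_af, threshold=COMMON_AF_THRESHOLD):
    """Return True when a gnomAD allele frequency is present and >= threshold."""
    if gnomad_af is None or gnomad_af in ("", "."):
        # Assumption: a missing frequency means the variant is not known to be common.
        return False
    return float(gnomad_af) >= threshold


def filter_tag(row):
    """Hypothetical helper: derive the FILTER tag from gnomAD_AF alone."""
    return "common_variant" if is_common(row["gnomAD_AF"]) else "."


def keep_somatic(row):
    """Keep variants whose FILTER tag is '.' and whose IMPACT is HIGH."""
    return filter_tag(row) == "." and row["IMPACT"] == "HIGH"


# Toy rows (hypothetical values, not Biobank data).
toy_rows = [
    {"Hugo_Symbol": "NF1", "gnomAD_AF": "", "IMPACT": "HIGH"},           # kept: no frequency reported
    {"Hugo_Symbol": "SUZ12", "gnomAD_AF": "0.00001", "IMPACT": "HIGH"},  # kept: rare
    {"Hugo_Symbol": "TP53", "gnomAD_AF": "0.0005", "IMPACT": "HIGH"},    # dropped: at the cutoff
    {"Hugo_Symbol": "EED", "gnomAD_AF": "", "IMPACT": "MODIFIER"},       # dropped: impact
]

kept_genes = [r["Hugo_Symbol"] for r in toy_rows if keep_somatic(r)]
print(kept_genes)
```

Because the comparison is “>=”, a variant sitting exactly at 0.0005 is treated as common and removed.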

 

Figure 8 shows the top genes with somatic variants in PNF and MPNST samples after filtering out any common variants or variants that have low or medium impact.

Figure 8
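One way to arrive at a “top genes” list is to rank genes by the number of distinct samples carrying at least one retained variant. The sketch below shows that ranking under this assumption; the ranking criterion actually used for the figures is not stated in this report, and the function name and toy rows are hypothetical.

```python
from collections import defaultdict


def top_genes(rows, n=100):
    """Rank genes by the number of distinct samples with a retained variant."""
    samples_per_gene = defaultdict(set)
    for r in rows:
        samples_per_gene[r["Hugo_Symbol"]].add(r["Tumor_Sample_Barcode"])
    # Sort by descending sample count, then alphabetically to break ties.
    ranked = sorted(samples_per_gene.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(samples)) for gene, samples in ranked[:n]]


# Toy filtered variants (hypothetical rows, not Biobank data).
filtered_rows = [
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "sample_A"},
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "sample_A"},  # second hit, same sample
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "sample_B"},
    {"Hugo_Symbol": "SUZ12", "Tumor_Sample_Barcode": "sample_B"},
    {"Hugo_Symbol": "EED", "Tumor_Sample_Barcode": "sample_C"},
]

ranking = top_genes(filtered_rows, n=2)
print(ranking)
```

Using a set per gene means a sample with several variants in the same gene is counted once, so long genes with many hits in a single sample do not dominate the list.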

 

Figure 9 shows filtered somatic variants in PNF and MPNST samples in our list of genes of interest. These results show the following:

  • Not all PNF or MPNST samples show single nucleotide variants in the NF1 gene. There may be two reasons for this: a) samples may contain microdeletions or copy number variations in the NF1 gene, which would not be detected in this analysis; b) samples may have low tumor purity, which reduces the sensitivity for detecting NF1 variants.

  • 30% of the MPNST samples show variants in SUZ12, a gene known to be affected in MPNSTs.
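A per-gene percentage such as the SUZ12 figure can be computed as the share of samples with at least one retained variant in that gene. The sketch below is illustrative only: the sample IDs and rows are hypothetical toy data chosen to give a round number, and the function name is an assumption.

```python
def fraction_with_variant(rows, all_samples, gene):
    """Fraction of all_samples that carry at least one retained variant in gene."""
    sample_set = set(all_samples)
    mutated = {
        r["Tumor_Sample_Barcode"]
        for r in rows
        if r["Hugo_Symbol"] == gene and r["Tumor_Sample_Barcode"] in sample_set
    }
    return len(mutated) / len(sample_set)


# Hypothetical cohort of ten MPNST samples (toy IDs, not Biobank data).
mpnst_samples = [f"mpnst_{i}" for i in range(1, 11)]

toy_rows = [
    {"Hugo_Symbol": "SUZ12", "Tumor_Sample_Barcode": "mpnst_1"},
    {"Hugo_Symbol": "SUZ12", "Tumor_Sample_Barcode": "mpnst_2"},
    {"Hugo_Symbol": "SUZ12", "Tumor_Sample_Barcode": "mpnst_2"},  # second hit, same sample
    {"Hugo_Symbol": "SUZ12", "Tumor_Sample_Barcode": "mpnst_3"},
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "mpnst_4"},
]

suz12_fraction = fraction_with_variant(toy_rows, mpnst_samples, "SUZ12")
print(f"{suz12_fraction:.0%}")
```

The denominator is taken from the full sample list rather than from the filtered variant table, because samples with no retained variants do not appear in that table and would otherwise inflate the percentage.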

 

Figure 9

 

3.3.3 Filtered somatic variants in Triad Samples:

Next, we specifically selected the patients who provided samples of normal, benign, and malignant tissue. This set of samples is called “TRIADS”. The patients with triad samples are: “JH-2-002”, “JH-2-015”, “JH-2-016”, “JH-2-023”, “JH-2-031”, “JH-2-045”, “JH-2-055”, “JH-2-084”.

 

Figure 10 shows the top 100 genes with filtered somatic variants in PNF and MPNST samples. We note that NF1 is among the top 100 genes with high-impact somatic variants.

 

Figure 10

 

Figure 11 shows filtered variants in our list of genes of interest in the triad samples.

As before, we note that many of the samples do not show any variants in the NF1 gene. The Biobank is currently reviewing tumor purity information for these samples to rule out any purity-related issues.

 

Figure 11